TRAINABILITY OF ReLU NETWORKS AND DATA-DEPENDENT INITIALIZATION
Related resources
Capacity and Trainability in Recurrent Neural Networks
Two potential bottlenecks on the expressiveness of recurrent neural networks (RNNs) are their ability to store information about the task in their parameters, and to store information about the input history in their units. We show experimentally that all common RNN architectures achieve nearly the same per-task and per-unit capacity bounds with careful training, for a variety of tasks and stac...
The Multilinear Structure of ReLU Networks
We study the loss surface of neural networks equipped with a hinge loss criterion and ReLU or leaky ReLU nonlinearities. Any such network defines a piecewise multilinear form in parameter space, and as a consequence, optima of such networks generically occur in non-differentiable regions of parameter space. Any understanding of such networks must therefore carefully take into account their non-...
Path-Normalized Optimization of Recurrent Neural Networks with ReLU Activations
We investigate the parameter-space geometry of recurrent neural networks (RNNs), and develop an adaptation of path-SGD optimization method, attuned to this geometry, that can learn plain RNNs with ReLU activations. On several datasets that require capturing long-term dependency structure, we show that path-SGD can significantly improve trainability of ReLU RNNs compared to RNNs trained with SGD...
Journal
Journal title: Journal of Machine Learning for Modeling and Computing
Year: 2020
ISSN: 2689-3967
DOI: 10.1615/jmachlearnmodelcomput.2020034126